stochastic variational inference
Incremental Variational Sparse Gaussian Process Regression
Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference. However, the majority of these approaches are batch methods that operate on the entire training dataset at once, precluding the use of datasets that are streaming or too large to fit into memory. Although previous work has considered incrementally solving variational sparse GPR, most algorithms fail to update the basis functions and therefore perform suboptimally. We propose a novel incremental learning algorithm for variational sparse GPR based on stochastic mirror ascent of probability densities in reproducing kernel Hilbert space. This new formulation allows our algorithm to update basis functions online in accordance with the manifold structure of probability densities for fast convergence. We conduct several experiments and show that our proposed approach achieves better empirical performance in terms of prediction error than the recent state-of-the-art incremental solutions to variational sparse GPR.
Scaling Factorial Hidden Markov Models: Stochastic Variational Inference without Messages
Factorial Hidden Markov Models (FHMMs) are powerful models for sequential data but they do not scale well with long sequences. We propose a scalable inference and learning algorithm for FHMMs that draws on ideas from the stochastic variational inference, neural network and copula literatures. Unlike existing approaches, the proposed algorithm requires no message passing procedure among latent variables and can be distributed to a network of computers to speed up learning. Our experiments corroborate that the proposed algorithm does not introduce further approximation bias compared to the proven structured mean-field algorithm, and achieves better performance with long sequences and large FHMMs.
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes the parsimonious triangular model (PTM), which constrains the O(K^3) parameter space of the mixed-membership triangular model (MMTM) to O(K) for faster inference. The authors develop a stochastic variational inference algorithm for PTM, along with additional approximation tricks to make it more scalable. Experiments on a synthetic dataset show that reducing the number of variables may lead to stronger statistical power, and experiments on real-world datasets show that the proposed method is competitive with existing methods in terms of accuracy. Quality: PTM seems to be an interesting specialization of MMTM, but the practical advantage of scaling well in K (the number of possible roles) is questionable. To evaluate the value of such a method empirically, it is critical to answer the question: how does it help to be able to learn MMTM with large K? Since MMSB and MMTM are mixed-membership models, using a small K may not be as troublesome as it is in single-membership models!
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Stochastic variational inference (SVI) requires careful selection of a step size. This paper proposes a Kalman filter (KF) to set the step size automatically. The authors show that the standard Gaussian KF does not satisfy the Robbins-Monro criteria (and performs badly). They propose to apply a KF based on t-distributions, and show that this gives better results than standard SVI.
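For context on the criteria mentioned in this review, the following is a minimal sketch of the hand-tuned baseline that an automatic step-size method would replace. It assumes the commonly used schedule rho_t = (t + tau)^(-kappa), which satisfies the Robbins-Monro conditions (step sizes sum to infinity, squared step sizes sum to a finite value) when kappa lies in (0.5, 1]. The function names and default values are illustrative and do not come from the reviewed paper.

```python
def robbins_monro_step(t, tau=1.0, kappa=0.7):
    """Step size rho_t = (t + tau) ** (-kappa) for iteration t >= 0.

    Assumed schedule, not taken from the reviewed paper. For
    kappa in (0.5, 1], sum(rho_t) diverges while sum(rho_t ** 2)
    converges, which is what the Robbins-Monro conditions require.
    """
    if not 0.5 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0.5, 1]")
    if tau <= 0:
        raise ValueError("tau must be positive")
    return (t + tau) ** (-kappa)


def svi_update(lam, lam_hat, rho):
    """Blend the current global parameter with a noisy estimate.

    lam is the current variational parameter and lam_hat is the
    estimate computed from one minibatch (both plain floats here).
    """
    return (1.0 - rho) * lam + rho * lam_hat


if __name__ == "__main__":
    lam = 0.0
    for t in range(5):
        rho = robbins_monro_step(t)
        lam = svi_update(lam, lam_hat=10.0, rho=rho)
        print(t, round(rho, 4), round(lam, 4))
```

The two constants tau and kappa are exactly the quantities that need careful manual selection in this baseline.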
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Summary: The paper introduces a simple strategy to reduce the variance of gradients in stochastic variational inference methods. Variance reduction is achieved by storing the contributions of the last L data points to the stochastic gradient and averaging these values. There is a bias-variance trade-off: variance reduction comes at the cost of increased bias in the gradient estimates. The trade-off can be controlled by varying the sliding window size L. This strategy also requires storing the last L data-point gradient contributions, which can be a significant storage cost.
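The averaging scheme summarized in this review can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the paper's implementation: gradient contributions are plain floats, and the class name `SlidingWindowAverage` is hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the most recent `window_size` gradient contributions.

    A larger window lowers the variance of the averaged gradient but
    includes older (staler) contributions, which adds bias.
    """

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.buffer = deque(maxlen=window_size)
        self.total = 0.0

    def update(self, grad):
        # If the window is full, the oldest entry is about to be
        # evicted by append(), so remove it from the running sum first.
        if len(self.buffer) == self.buffer.maxlen:
            self.total -= self.buffer[0]
        self.buffer.append(grad)
        self.total += grad
        return self.total / len(self.buffer)


if __name__ == "__main__":
    avg = SlidingWindowAverage(window_size=2)
    for g in [1.0, 3.0, 5.0]:
        print(avg.update(g))
```

The buffer holds L entries, which makes the storage cost noted in the review explicit: memory grows linearly with the window size times the size of one gradient contribution.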
Stochastic variational inference for hidden Markov models
Nick Foti, Jason Xu, Dillon Laird, Emily Fox
Variational inference algorithms have proven successful for Bayesian analysis in large data settings, with recent advances using stochastic variational inference (SVI). However, such methods have largely been studied in independent or exchangeable data settings. We develop an SVI algorithm to learn the parameters of hidden Markov models (HMMs) in a time-dependent data setting. The challenge in applying stochastic optimization in this setting arises from dependencies in the chain, which must be broken to consider minibatches of observations. We propose an algorithm that harnesses the memory decay of the chain to adaptively bound errors arising from edge effects. We demonstrate the effectiveness of our algorithm on synthetic experiments and a large genomics dataset where a batch algorithm is computationally infeasible.
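One way to picture the minibatch construction described in this abstract is to sample a contiguous subchain and pad it with extra observations on both sides, so that messages near the cut points are less affected by edge effects. The sketch below uses a fixed buffer width for simplicity, whereas the abstract describes an adaptive bound. The function name and its parameters are hypothetical and are not taken from the paper.

```python
import random


def sample_buffered_subchain(T, length, buffer, rng=random):
    """Pick a random subchain of a length-T sequence, plus padding.

    Returns ((lo, hi), (start, end)) as half-open index ranges:
    (start, end) is the subchain used for the update, and (lo, hi)
    is the padded range on which local inference would be run.
    The padding is clipped at the sequence boundaries.
    """
    if not 0 < length <= T:
        raise ValueError("length must be in 1..T")
    if buffer < 0:
        raise ValueError("buffer must be non-negative")
    start = rng.randrange(T - length + 1)
    end = start + length
    lo = max(0, start - buffer)
    hi = min(T, end + buffer)
    return (lo, hi), (start, end)


if __name__ == "__main__":
    rng = random.Random(0)
    outer, inner = sample_buffered_subchain(T=1000, length=50, buffer=10, rng=rng)
    print("padded range:", outer, "subchain:", inner)
```

With memory decay in the chain, the influence of observations outside the padded range shrinks as the buffer grows, which is the intuition behind choosing the buffer width adaptively rather than fixing it as done here.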
A Filtering Approach to Stochastic Variational Inference
Stochastic variational inference (SVI) uses stochastic optimization to scale up Bayesian computation to massive data. We present an alternative perspective on SVI as approximate parallel coordinate ascent. SVI trades off bias and variance to step close to the unknown true coordinate optimum given by batch variational Bayes (VB). We define a model to automate this process.